A phase vocoder is a type of vocoder which can scale both the frequency and time domains of audio signals by using phase information. The computer algorithm allows frequency-domain modifications to a digital sound file (typically time expansion/compression and pitch shifting).
At the heart of the phase vocoder is the short-time Fourier transform (STFT), typically implemented using fast Fourier transforms. The STFT converts a time-domain representation of sound into a time-frequency representation (the "analysis" phase), which allows the amplitudes or phases of specific frequency components to be modified before the frequency-domain representation is resynthesized into the time domain by the inverse STFT. The time evolution of the resynthesized sound can be changed by modifying the time positions of the STFT frames prior to resynthesis, which allows time-scale modification of the original sound file.
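The chain of analysis, modification and resynthesis can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names (`stft`, `istft`, `time_stretch`), the Hann window, the FFT size and the hop sizes are all assumptions chosen for the example. Frames are analysed at one hop size and overlap-added at another, and the phase of each bin is advanced according to its estimated instantaneous frequency so that sinusoids stay continuous from frame to frame.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=1024, hop=256):
    """Windowed overlap-add resynthesis, normalized by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = X.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # The floor avoids dividing by (almost) zero at the signal edges.
    return out / np.maximum(norm, 1e-3)

def time_stretch(x, factor, n_fft=1024, hop_a=128):
    """Stretch x in time by `factor` (> 1 is slower) without changing pitch."""
    hop_s = int(round(hop_a * factor))             # synthesis hop size
    X = stft(x, n_fft, hop_a)
    mag, phase = np.abs(X), np.angle(X)
    # Centre frequency of each bin in radians per sample.
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    out_phase = np.empty_like(phase)
    out_phase[0] = phase[0]
    for t in range(1, X.shape[0]):
        # Deviation of the measured phase advance from the bin-centre advance,
        # wrapped to [-pi, pi].
        dphi = phase[t] - phase[t - 1] - omega * hop_a
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        inst_freq = omega + dphi / hop_a           # instantaneous frequency
        # Advance the synthesis phase over the (different) synthesis hop.
        out_phase[t] = out_phase[t - 1] + inst_freq * hop_s
    return istft(mag * np.exp(1j * out_phase), n_fft, hop_s)
```

For example, `time_stretch(x, 2.0)` returns a signal roughly twice as long as `x` with its pitch unchanged. This basic scheme treats every bin independently and makes no attempt to keep neighbouring bins consistent with one another.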
The main problem that has to be solved in every manipulation of the STFT is that individual signal components (sinusoids, impulses) are spread over multiple frames and multiple STFT frequency locations (bins). This is because the STFT analysis uses overlapping analysis windows. Windowing causes spectral leakage, so that the information of an individual sinusoidal component is spread over adjacent STFT bins. To avoid border effects caused by the tapering of the analysis windows, the windows overlap in time. Because of this overlap, adjacent STFT frames are strongly correlated: a sinusoid present in the analysis frame at time "t" will be present in the subsequent frames as well. Signal transformation with the phase vocoder is therefore difficult, because every modification made in the STFT representation needs to preserve the appropriate correlation between adjacent frequency bins (vertical coherence) and between adjacent time frames (horizontal coherence). Except for extremely simple synthetic sounds, these correlations can be preserved only approximately, and since the invention of the phase vocoder, research has mainly been concerned with finding algorithms that preserve the vertical and horizontal coherence of the STFT representation after modification. For time-scaling operations, amplitude coherence is only a minor problem, because shifting analysis frames in time has only a minor impact on the amplitude. The phase coherence problem was studied for a long time before appropriate solutions emerged.
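The spreading of a single sinusoid over neighbouring bins is easy to observe numerically. The short NumPy sketch below analyses one Hann-windowed frame of a sinusoid whose frequency falls halfway between two bin centres; the frame length, the test frequency and the 20 dB threshold are arbitrary choices made for illustration.

```python
import numpy as np

n_fft = 1024
n = np.arange(n_fft)
window = np.hanning(n_fft + 1)[:-1]          # periodic Hann window

# A sinusoid whose frequency lies halfway between bins 100 and 101.
x = np.cos(2 * np.pi * 100.5 * n / n_fft)

spectrum = np.abs(np.fft.rfft(x * window))
level_db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)

# Bins within 20 dB of the strongest bin: a single sinusoid occupies several.
leaky_bins = np.flatnonzero(level_db > -20)
print(leaky_bins)
```

With these settings the sinusoid shows up in four adjacent bins (99 to 102) rather than one, so any modification of the spectrum has to treat those bins consistently.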
The phase vocoder was introduced in 1966 by Flanagan as an algorithm that would preserve horizontal coherence between the phases of bins that represent sinusoidal components.[1] This original phase vocoder did not take into account the vertical coherence between adjacent frequency bins, and therefore time stretching with this system produced sound signals that lacked clarity.
The optimal reconstruction of the sound signal from an STFT after amplitude modifications was proposed by Griffin and Lim in 1984.[2] This algorithm does not address the problem of producing a coherent STFT, but it does find the sound signal whose STFT is as close as possible to the modified STFT, even if the modified STFT is not coherent (does not represent any signal).
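A rough sketch of this kind of reconstruction, assuming NumPy, is given below. The `istft` helper performs a least-squares overlap-add (dividing by the summed squared window), and `griffin_lim` alternates between that inversion and re-analysis, keeping the target magnitudes and updating only the phases. The helper names, the Hann window, the frame sizes and the random initial phase are assumptions made for this example, not details taken from the cited paper.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=512, hop=128):
    """Least-squares overlap-add: the signal whose STFT is closest to X."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = X.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # The floor avoids dividing by (almost) zero at the signal edges.
    return out / np.maximum(norm, 1e-3)

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a signal whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, mag.shape)   # arbitrary starting phase
    for _ in range(n_iter):
        # Invert the (generally incoherent) STFT, then re-analyse the result
        # and keep only its phase for the next round.
        x = istft(mag * np.exp(1j * phase), n_fft, hop)
        phase = np.angle(stft(x, n_fft, hop))
    return istft(mag * np.exp(1j * phase), n_fft, hop)
```

Calling `griffin_lim(np.abs(stft(x)))` returns a signal of the length covered by the frames; its STFT magnitude is usually much closer to the target than that of the random-phase starting point, although the result is not guaranteed to match the original waveform.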
The problem of vertical coherence remained a major issue for the quality of time-scaling operations until 1999, when Laroche and Dolson[3] proposed a rather simple means of preserving phase consistency across spectral bins. Their proposal can be seen as a turning point in phase vocoder history: it showed that very high-quality time-scaling transformations can be obtained by ensuring vertical phase consistency.
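One way to picture such a scheme is to let each spectral peak carry its neighbouring bins along with it. The sketch below is a simplified illustration in that spirit, assuming NumPy and operating on a single frame: the function name `identity_phase_lock`, the peak-picking rule and the nearest-peak assignment are choices made for this example and should not be read as the exact procedure of the cited paper. The synthesis phase at each peak is taken as given (for instance from ordinary phase propagation), and every other bin keeps the phase offset it had relative to its peak in the analysis frame.

```python
import numpy as np

def identity_phase_lock(mag, analysis_phase, synth_phase):
    """Lock each bin's synthesis phase to that of its nearest spectral peak."""
    n = len(mag)
    # A peak is a bin that is the largest among its two neighbours on each side.
    peaks = [k for k in range(2, n - 2)
             if mag[k] > 0 and mag[k] == mag[k - 2:k + 3].max()]
    if not peaks:
        return synth_phase.copy()
    peaks = np.array(peaks)
    bins = np.arange(n)
    # Region of influence: assign every bin to the closest peak.
    nearest = peaks[np.argmin(np.abs(bins[:, None] - peaks[None, :]), axis=1)]
    # Rotate the whole region by the same angle as its peak, which preserves
    # the analysis phase differences between the peak and its neighbours.
    rotation = synth_phase[nearest] - analysis_phase[nearest]
    return analysis_phase + rotation
```

In a time-stretching loop, a function like this would be applied to each frame after the synthesis phases of the peaks have been computed and before the frame is resynthesized.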
The algorithm proposed by Laroche and Dolson did not preserve horizontal phase coherence at sound onsets (note onsets). A solution to this problem was proposed by Roebel.[4]
One software implementation of phase-vocoder-based signal transformation that uses techniques similar to those described above to achieve high-quality results is Ircam's SuperVP.[5]
British composer Trevor Wishart used phase vocoder analyses and transformations of a human voice as the basis for his composition VOX 5 (part of his larger VOX Cycle).[6] Transfigured Wind by American composer Roger Reynolds uses the phase vocoder to perform time-stretching of flute sounds.[7]
The proprietary Auto-Tune pitch-correcting software, widely used in commercial music production, is based on the phase vocoder principle.